Abstract: Lung cancer is one of the leading causes of death in India. Many approaches to diagnosing and detecting
lung cancer have been developed using various data analysis and classification techniques. Because the cause of lung
cancer remains unclear, prevention is difficult, and early detection of lung tumors is the key to successful treatment.
Hence, a lung cancer detection system using image processing and machine learning is used to classify the presence of
lung cancer in CT images and blood samples. Because CT scan reports are more effective than mammography,
patient CT scan images are used and categorized as normal or abnormal. The abnormal images are subjected to
segmentation to focus on the tumor portion. Classification is performed on features extracted from the images. The
method uses SVM and image processing techniques to identify lung cancer and its stages more precisely.
1. Introduction


Cancer is one of the diseases that people are
particularly concerned about today. Deaths from lung
cancer are expected to continue rising, reaching around
17 million worldwide in 2030. Every sixth death in the
world is due to cancer, making it the second leading
cause of death (second only to cardiovascular
diseases), and lung cancer is one of the deadliest
cancers. The main cause is the formation of cancerous
nodules in the lobes of the lung. Therefore, early
detection of nodules is essential. Because nodules
appear as small dots in computed tomography images,
clinicians need to examine each image one by one, which
is time-consuming and leaves a real possibility of
overlooking a nodule. Early detection of lung cancer
can increase the chance of survival. There are many
techniques to diagnose lung cancer, such as chest
radiography (X-ray), Computed Tomography (CT),
Magnetic Resonance Imaging (MRI), and sputum
cytology.

However, most of these techniques are expensive
and time-consuming. Therefore, there is a great need for
new technology to diagnose lung cancer in its early
stages. Image processing techniques provide a
good-quality tool for improving manual analysis. Many
approaches to diagnosing and detecting lung cancer have
been developed using various data analysis and
classification techniques. Because the cause of lung
cancer remains unclear, prevention is difficult, and
early detection of lung tumors is the key to successful
treatment. Hence, a lung cancer detection system using
image processing and machine learning is used to
classify the presence of lung cancer in CT images and
blood samples. Because CT scan reports are more
effective than mammography, patient CT scan images are
used and categorized as normal or abnormal. The
abnormal images are subjected to segmentation to focus
on the tumor portion. Classification is performed on
features extracted from the images. The aim is an
efficient method that detects lung cancer and its
stages with more accurate results using SVM and image
processing techniques.


The rest of the paper is organized as follows.
Section 2 describes related work, Section 3 presents the
methodologies, and Section 4 presents the results and
analysis. Finally, Section 5 concludes the paper.


2. Related Work 


In [10], image preprocessing methods such as
thresholding, border clearing, and morphological
operations (viz., erosion, closing, opening) are
discussed for detecting lung nodule regions, i.e., the
Region of Interest (ROI), in patient lung CT scan
images. Machine learning techniques such as Support
Vector Machine (SVM) and Convolutional Neural Network
(CNN) are also discussed for classifying nodule and
non-nodule objects in patient lung CT scan images using
the sets of lung nodule regions.

The authors of [11] note that lung segmentation in
Computed Tomography (CT) images plays a vital role in
diagnosing, detecting, and visualizing lung nodules in
three dimensions. In addition, the stability, accuracy,
and efficiency of lung segmentation in CT images
significantly affect the performance of Computer-Aided
Detection (CAD) systems. Lung segmentation is usually
the first step in lung CT image analysis. The paper
presents a fully automated algorithm for recognition and
segmentation of the lung in 3D X-ray images using the
Active Shape Model (ASM).

The paper [12] presents a fully automated model for
segmenting NSCLC nodule(s) from a CT scan image.
The proposed method follows four steps: (1)
preprocessing, (2) Automatic Lung Parenchyma
Extraction and Border Repair (ALPE&BR), (3)
automatic lung nodule segmentation using Connected
Component Analysis (CCA) and the Threshold-Based
Mathematical Nodule (TBMN) refinement algorithm,
and (4) nodule filtering using the Hounsfield Unit (HU)
value and extraction of valid cancerous nodules.

The paper [13] proposes an adaptive solution to
mitigate the difficulty of the thresholding-based method
in lung segmentation. Sufficient detection power for
nodule candidates is inevitably accompanied by many
(obvious) false positives (FPs). A rule-based filtering
operation is often employed to cheaply and drastically
reduce the number of obvious FPs, so that their
influence on the computationally more expensive learning
process can be eliminated. FP reduction using machine
learning has been extensively studied in the literature.
Compared with unsupervised learning, which aims to find
hidden structures in unlabelled data, supervised
learning, which aims to infer a function from labeled
training data, is more frequently used to design a
computer-aided detection (CADe) system. Compared with
existing approaches, morphology-based lung cancer
detection can be an alternative with either comparable
detection performance at lower computational cost, or
comparable cost and better detection performance.

Various preprocessing filters are available. The
purpose of preprocessing is to enhance useful features
and suppress unwanted ones before segmenting the
nodule. Segmentation is an important step in image
processing that helps detect cancerous tissue earlier
for treatment.


3. Methodologies 


Lung CT images have low noise compared with
other scan types such as MRI. The main advantages of
a computed tomography image are better clarity, low
noise, and low distortion. CT images are therefore
used for processing the lung image, and segmentation
is then applied to the lung image.


Figure 1. Input lung CT image


The architecture of the proposed system for lung
cancer detection in CT images is shown in Figure 2.
The methodology is carried out in five main steps, and
each step is discussed in detail in the subsections
below.

Figure 2. Block diagram of the proposed model (blocks
include Pre-processing, Quality Enhanced Image, and
Cancer Classification)


3.1 Data Set 

For the proposed work, the Lung Nodule Analysis
(LUNA) dataset is used, which is derived from the Lung
Image Database Consortium and Image Database
Resource Initiative (LIDC/IDRI) database [14]. Of the
1018 CT scans available, 888 are used after excluding
scans with a slice thickness greater than 2.5 mm.
The CT scans, originally in DICOM format, are
provided in MHD and RAW formats. The MHD file is
a MetaImage header, a tagged text format for medical
images, and the RAW file contains the corresponding
image data. The dataset includes annotations for the
CT scans. These annotations were produced by expert
radiologists and give the location (X, Y, Z
coordinates) and diameter of the nodules in the lung
CT images. LUNA was a challenge held as part of the
2016 IEEE International Symposium on Biomedical
Imaging.
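
The annotation coordinates in LUNA are expressed in
world units (millimetres), so they have to be mapped to
voxel indices before a nodule can be located in the image
array. The Python sketch below illustrates one common way
to do this. It assumes an axis-aligned scan whose origin
and voxel spacing have already been read from the MHD
header; the function names and all numeric values are
hypothetical and are not taken from the dataset.

```python
def world_to_voxel(world_xyz, origin_xyz, spacing_xyz):
    """Map a world-space point (mm) to integer voxel indices.

    Assumes the scan axes are aligned with the world axes, so that
    voxel = (world - origin) / spacing, rounded to the nearest voxel.
    """
    return tuple(
        int(round((w - o) / s))
        for w, o, s in zip(world_xyz, origin_xyz, spacing_xyz)
    )


def diameter_in_voxels(diameter_mm, spacing_xyz):
    """Express a nodule diameter (mm) as a voxel count along each axis."""
    return tuple(diameter_mm / s for s in spacing_xyz)


# Hypothetical header values and annotation, for illustration only.
origin = (-200.0, -180.0, -300.0)      # mm, world position of voxel (0, 0, 0)
spacing = (0.7, 0.7, 1.25)             # mm per voxel along x, y, z
nodule_world = (-130.0, -40.0, -175.0)  # annotated nodule centre in mm

center_voxel = world_to_voxel(nodule_world, origin, spacing)
extent = diameter_in_voxels(7.0, spacing)
```

Because the slice spacing usually differs from the
in-plane spacing, the same diameter covers a different
number of voxels along each axis.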


3.2 Nodule Extraction and Preprocessing 

3.3 Image Segmentation

Segmentation is the process of separating the
required region of interest from the image.
Mathematical morphological operations are powerful
tools for acquiring lung regions from binary images.
In our methodology, the preprocessed grayscale images
were first converted to binary images. A morphological
opening operation with a disk structuring element was
then performed on the binary image to remove unwanted
components. After opening, the image was enhanced and
its border was cleared. The lung masks were obtained
by filling the remaining gaps and holes. The tumor
region was segmented by an exclusive OR operation on
the lung mask output and the clear-border output.
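
The binarization, opening, and exclusive OR steps can be
sketched in Python as follows. This is an illustrative
sketch and not the MATLAB implementation used in this
work: images are plain lists of rows, the structuring
element is a radius-1 disk (a plus shape), and the
threshold, the helper names, and the toy image are
assumptions. Border clearing and hole filling are
omitted for brevity.

```python
# Offsets of a radius-1 disk (a plus-shaped, symmetric structuring element).
DISK_R1 = [(0, 0), (-1, 0), (1, 0), (0, -1), (0, 1)]


def binarize(gray, threshold):
    """Convert a grayscale image (list of rows) to a 0/1 binary image."""
    return [[1 if px > threshold else 0 for px in row] for row in gray]


def _get(img, r, c):
    """Pixel value with zero padding outside the image."""
    if 0 <= r < len(img) and 0 <= c < len(img[0]):
        return img[r][c]
    return 0


def erode(img, offsets=DISK_R1):
    """Keep a pixel only if every pixel under the structuring element is 1."""
    return [
        [int(all(_get(img, r + dr, c + dc) for dr, dc in offsets))
         for c in range(len(img[0]))]
        for r in range(len(img))
    ]


def dilate(img, offsets=DISK_R1):
    """Set a pixel if any pixel under the (symmetric) structuring element is 1."""
    return [
        [int(any(_get(img, r + dr, c + dc) for dr, dc in offsets))
         for c in range(len(img[0]))]
        for r in range(len(img))
    ]


def opening(img, offsets=DISK_R1):
    """Morphological opening: erosion followed by dilation."""
    return dilate(erode(img, offsets), offsets)


def xor(mask_a, mask_b):
    """Pixel-wise exclusive OR of two binary masks of the same size."""
    return [[a ^ b for a, b in zip(ra, rb)] for ra, rb in zip(mask_a, mask_b)]


# Toy image: a plus-shaped blob and one isolated bright pixel.
gray = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 0, 9, 0, 0, 0, 0],
    [0, 9, 9, 9, 0, 0, 0],
    [0, 0, 9, 0, 0, 9, 0],
    [0, 0, 0, 0, 0, 0, 0],
]
binary = binarize(gray, threshold=5)
opened = opening(binary)
difference = xor(binary, opened)
```

In this toy example the opening keeps the blob that can
contain the structuring element and discards the isolated
pixel, and the exclusive OR of the two masks marks exactly
the pixels in which they differ.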


3.4 Feature Extraction 

Feature extraction is the essential step that
transforms input data into the required features. This
stage extracts significant properties of the segmented
region of interest, which serve as input for
classifying CT scan images. The size and shape of the
tumor present in the lungs are estimated by extracting
three geometrical features of a cancerous lung nodule:
area, perimeter, and eccentricity.

1. Area: a scalar quantity that gives the total number
of pixels occupied by the cancerous lung nodule. The
area is evaluated from the binary image by counting
the pixels with value 1.
2. Perimeter: a scalar quantity that gives the total
number of pixels on the border of the lung tumor. The
perimeter is evaluated from the binary image by
counting the pixels with value 1 on the outline of the
lung nodule.
3. Eccentricity: this metric is also called the
irregularity index (I), circularity, or roundness. For
a circular shape the value equals 1, and with the
definition below it is greater than 1 for any other
shape.
Eccentricity = length of major axis / length of minor
axis
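
A minimal Python sketch of the three features is given
below, assuming a binary mask stored as a list of rows
with value 1 for nodule pixels. The axis lengths are
estimated from the second-order moments of the pixel
coordinates, which is an assumption about how the axes
are measured, and the function name nodule_features is
hypothetical. The third value is returned as the ratio of
the major axis to the minor axis, following the formula
above.

```python
import math


def nodule_features(mask):
    """Return (area, perimeter, axis_ratio) of the value-1 region in a mask."""
    rows, cols = len(mask), len(mask[0])
    pixels = [(r, c) for r in range(rows) for c in range(cols) if mask[r][c] == 1]
    area = len(pixels)
    if area == 0:
        raise ValueError("mask contains no foreground pixels")

    def is_background(r, c):
        return not (0 <= r < rows and 0 <= c < cols) or mask[r][c] == 0

    # Perimeter: foreground pixels with at least one 4-connected
    # background neighbour (pixels outside the image count as background).
    perimeter = sum(
        1 for r, c in pixels
        if any(is_background(r + dr, c + dc)
               for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)))
    )

    # Second central moments of the pixel coordinates.
    mean_r = sum(r for r, _ in pixels) / area
    mean_c = sum(c for _, c in pixels) / area
    var_r = sum((r - mean_r) ** 2 for r, _ in pixels) / area
    var_c = sum((c - mean_c) ** 2 for _, c in pixels) / area
    cov = sum((r - mean_r) * (c - mean_c) for r, c in pixels) / area

    # Eigenvalues of the 2x2 covariance matrix scale with the squared
    # axis lengths, so their ratio gives the squared axis ratio.
    half_trace = (var_r + var_c) / 2
    spread = math.sqrt(((var_r - var_c) / 2) ** 2 + cov ** 2)
    major, minor = half_trace + spread, half_trace - spread
    axis_ratio = math.sqrt(major / minor) if minor > 0 else math.inf
    return area, perimeter, axis_ratio


# Toy masks: a 3x3 square and a 2x4 rectangle.
square = [[1 if 1 <= r <= 3 and 1 <= c <= 3 else 0 for c in range(5)] for r in range(5)]
rectangle = [[1 if 1 <= r <= 2 and 1 <= c <= 4 else 0 for c in range(6)] for r in range(4)]
square_features = nodule_features(square)
rectangle_features = nodule_features(rectangle)
```

A compact, symmetric region such as the square yields an
axis ratio of 1, while the elongated rectangle yields a
larger value.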


3.5 Classification 

The classification stage involves labeling the CT
scan images as normal or abnormal. In our method, the
SVM algorithm is used to detect lung cancer in CT
images. SVM classifiers are supervised learning models
that analyze input data and classify them according to
learned patterns. The SVM classifier builds a model
from a training dataset whose examples belong to one
of two classes, and then assigns new examples from the
testing dataset to one of the two classes. The SVM
classifier thus finds the best hyperplane that
separates the two groups and classifies the lung CT
images. The best hyperplane is the one that separates
the data points of one class from those of the other
with the largest margin.
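
To make the idea of a maximum-margin hyperplane concrete,
the Python sketch below trains a linear soft-margin SVM by
sub-gradient descent on the regularized hinge loss. It is
an illustrative sketch and not the MATLAB classifier used
in this work: the learning rate, regularization strength,
number of epochs, function names, and the two-dimensional
toy feature vectors are all assumptions. Labels are +1
(abnormal) and -1 (normal).

```python
def train_linear_svm(samples, labels, lr=0.01, lam=0.01, epochs=100):
    """Fit weights w and bias b of a linear SVM with labels in {+1, -1}."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            # The L2 regularizer shrinks the weights at every step,
            # which favours a wide margin.
            w = [wi * (1.0 - lr * lam) for wi in w]
            if margin < 1:
                # Sample is misclassified or inside the margin:
                # move the hyperplane away from it.
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b


def predict(w, b, x):
    """Return +1 or -1 depending on the side of the hyperplane."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1


# Hypothetical, already standardized feature vectors (two features each).
train_x = [(2.0, 2.0), (3.0, 3.0), (2.0, 3.0),
           (-2.0, -2.0), (-3.0, -3.0), (-2.0, -3.0)]
train_y = [1, 1, 1, -1, -1, -1]

w, b = train_linear_svm(train_x, train_y)
train_predictions = [predict(w, b, x) for x in train_x]
new_predictions = [predict(w, b, (2.5, 2.5)), predict(w, b, (-2.5, -2.5))]
```

Real features such as area and perimeter have very
different scales, so they would normally be standardized
before training a classifier of this kind.
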
4. Results


The proposed model was developed in MATLAB
R2016a. MATLAB is one of the tools for research
development and analysis [15]. Both detection and
feature extraction are implemented in MATLAB, and
classification is implemented using a machine learning
toolbox. The Classification Learner app makes it quick
and easy to develop a trained prediction model from
the extracted features. Five-fold cross-validation was
used to prevent overfitting during the training
process. Sixteen different DICOM images from LIDC are
used for training the classifier, and the result is
validated using 5 images with a total of 15 nodules.
The database for this study was obtained from the LUNA
dataset [14]. Figure 2 shows the CT scan image of a
patient affected by lung cancer.
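
The way five-fold cross-validation partitions the training
samples can be sketched in Python as follows. This is an
illustrative sketch, not the procedure built into the
MATLAB app: the function name is hypothetical, the samples
are split in their original order, and in practice the
indices would usually be shuffled first.

```python
def k_fold_indices(n_samples, k=5):
    """Yield (train_indices, validation_indices) for each of k folds."""
    indices = list(range(n_samples))
    # Spread the remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size


# Sixteen training samples split into five folds.
folds = list(k_fold_indices(16, k=5))
validation_sizes = [len(validation) for _, validation in folds]
```

Each sample is used for validation exactly once and for
training in the remaining four folds, so the reported
score does not depend on a single train/validation split.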


Figure 3. Experimental Illustration 
Further, in the preprocessing stage, image
enhancement was done using contrast adjustment. In
contrast adjustment, image intensity values are mapped
to the full display range of the input data, which
enhances the image's contrast. Figure 2 depicts the
contrast-enhanced CT scan image. The segmentation
technique divides the input image into various parts,
thus giving us the region of interest for further
processing. Morphological operations based on the
structuring element were used to extract the region of
interest. The lung masks and tumor region were
obtained from the CT image.
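
The mapping of intensity values to the full display range
can be illustrated with a simple linear (min-max) contrast
stretch in Python. This is a sketch under assumptions: the
function name and the 0 to 255 output range are chosen for
illustration, and the result may differ from the MATLAB
contrast adjustment function used in this work, which can
also saturate a fraction of the pixels.

```python
def stretch_contrast(gray, out_min=0, out_max=255):
    """Linearly map the image's intensity range onto [out_min, out_max]."""
    flat = [px for row in gray for px in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        # A constant image has no contrast to stretch.
        return [[out_min for _ in row] for row in gray]
    scale = (out_max - out_min) / (hi - lo)
    return [[int(round(out_min + (px - lo) * scale)) for px in row] for row in gray]


# Toy low-contrast image whose values occupy only part of the range.
low_contrast = [[100, 110], [120, 150]]
enhanced = stretch_contrast(low_contrast)
```

After stretching, the darkest pixel maps to the lower end
of the range and the brightest pixel to the upper end, so
intermediate differences become easier to see.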


Figure 3. Border-corrected output


These features serve as inputs to the SVM
classifier, which categorizes the images as normal
(non-cancerous) or abnormal (cancerous). For the given
sample CT scan image, the SVM classifier uses the
features of the extracted lung tumor to determine
whether the image is normal or abnormal.

Figure 4. Detected lung nodule 
Comparing the proposed model with the existing
model, accuracy increases from 88.4% to 93.52%.
Sensitivity remained the same, and specificity
increased from 40% to 50%. Features such as area,
perimeter, centroid, diameter, eccentricity, and mean
pixel intensity were extracted from the detected
cancer nodules. The extracted features were used to
train the support vector machine, and a trained model
was developed. Training time in the Classification
Learner app was 5.93 seconds. The app estimates the
prediction speed of the trained model to be 310
observations per second. The scatter plot of the
trained model is shown below.


Table 1. Confusion matrix data 


                                  | Existing | Proposed
                                  | model    | model (CNN)
Total number of nodules detected  | 23       | 24
Number of True Positive (TP)      | 21       | 21
Number of True Negative (TN)      | 2        | 2
Number of False Positive (FP)     | 2        | 3
Number of False Negative (FN)     | 0        | 0
Performance metrics: The performance of the
proposed model is measured using the following
metrics, where TP, TN, FP, and FN are the numbers of
true positives, true negatives, false positives, and
false negatives.


Accuracy = (TP+TN)/(TP+TN+FP+FN)
Sensitivity = TP/(TP+FN)
Specificity = TN/(TN+FP)
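
The three metrics can be computed directly from the
confusion-matrix counts, as in the Python sketch below.
The function name is hypothetical, and the counts in the
example are illustrative values rather than results from
this work. The sketch assumes that every denominator is
non-zero.

```python
def classification_metrics(tp, tn, fp, fn):
    """Return (accuracy, sensitivity, specificity) from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)   # fraction of actual positives detected
    specificity = tn / (tn + fp)   # fraction of actual negatives rejected
    return accuracy, sensitivity, specificity


# Illustrative counts only.
accuracy, sensitivity, specificity = classification_metrics(tp=40, tn=45, fp=5, fn=10)
```

Sensitivity depends only on the positive cases and
specificity only on the negative cases, so a classifier
can score well on one of them while doing poorly on the
other.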


From the above results, the proposed model
classifies nodules as benign or malignant with an
accuracy of 93.52%. The classification of nodules as
malignant or benign, which was not performed in the
existing model, has been successfully implemented.
5. Conclusion


Lung cancer is among the most dangerous and
widespread cancers in the world. Because the outcome
depends on the stage at which cancer cells are
discovered in the lungs, detecting the disease plays
an essential role in avoiding severe stages. Based on
this work, lung cancer can be detected and classified
using the neural network. This helps doctors improve
treatment at an early stage of cancer and can avoid
many deaths through early detection. The average
accuracy of the proposed system reached 93.52% for
detecting and classifying lung cancer using CNN.